To solve $P_{I}$, we first introduce Lagrange multipliers to incorporate the inequality
constraints and replace the objective function with the Lagrangian
\begin{equation*}
l\left( \boldsymbol{\gamma}; \boldsymbol{\lambda} \right) = f\left( \boldsymbol{\gamma} \right)
+ \boldsymbol{\lambda}^{T} c\left( \boldsymbol{\gamma} \right).
\end{equation*}
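To make the notation concrete, the following minimal Python sketch (assuming NumPy) encodes $f$, $c$, and the Lagrangian for a toy instance of our own choosing, minimizing $\gamma_{1}^{2} + \gamma_{2}^{2}$ subject to $1 - \gamma_{1} - \gamma_{2} \le 0$; it is an illustration, not the $P_{I}$ studied here.
\begin{verbatim}
import numpy as np

# Toy instance (our illustration, not the paper's P_I):
# f(gamma) = gamma_1^2 + gamma_2^2, c(gamma) = 1 - gamma_1 - gamma_2 <= 0.
def f(gamma):
    return gamma[0] ** 2 + gamma[1] ** 2

def c(gamma):
    # Inequality constraints stacked as a vector, convention c(gamma) <= 0.
    return np.array([1.0 - gamma[0] - gamma[1]])

def lagrangian(gamma, lam):
    # l(gamma; lambda) = f(gamma) + lambda^T c(gamma)
    return f(gamma) + lam @ c(gamma)

print(lagrangian(np.array([0.5, 0.5]), np.array([1.0])))  # 0.5 + 1*0 = 0.5
\end{verbatim}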
If optimality is achieved, there exists a solution $\left( \boldsymbol{\gamma}_{*}, \boldsymbol{\lambda}_{*} \right)$
satisfying the Karush-Kuhn-Tucker (KKT) conditions (Kuhn and Tucker, 1951):
\begin{equation*}
\left( KKT \right)
\begin{cases}
\left( a \right)\ \nabla f\left( \boldsymbol{\gamma}_{*} \right) + A\left( \boldsymbol{\gamma}_{*} \right)^{T} \boldsymbol{\lambda}_{*} = 0 &\text{(Stationarity)} \\
\left( b \right)\ c_{I}\left( \boldsymbol{\gamma}_{*} \right) \le 0 &\text{(Primal feasibility)} \\
\left( c \right)\ \left( \boldsymbol{\lambda}_{*} \right)_{I} \ge 0 &\text{(Dual feasibility)} \\
\left( d \right)\ \left( \boldsymbol{\lambda}_{*} \right)_{I}^{T} c_{I}\left( \boldsymbol{\gamma}_{*} \right) = 0 &\text{(Complementary slackness)},
\end{cases}
\tag{3.2}
\end{equation*}
where $\nabla f\left( \boldsymbol{\gamma}_{*} \right) = \left. \partial f\left( \boldsymbol{\gamma} \right)/\partial \boldsymbol{\gamma} \right|_{\boldsymbol{\gamma}_{*}}$
and $A\left( \boldsymbol{\gamma} \right) = \partial c_{I}\left( \boldsymbol{\gamma} \right)/\partial \boldsymbol{\gamma}$
is the Jacobian of the inequality constraints.
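On the toy instance above, the KKT point can be found by hand: stationarity gives $2\gamma_{1} = 2\gamma_{2} = \lambda$, and the active constraint then gives $\boldsymbol{\gamma}_{*} = (0.5, 0.5)$ with $\lambda_{*} = 1$. A minimal sketch checking conditions (a)-(d) numerically for that candidate:
\begin{verbatim}
import numpy as np

# Toy instance again: f(gamma) = gamma_1^2 + gamma_2^2,
# c(gamma) = 1 - gamma_1 - gamma_2 <= 0 (our illustration).
def grad_f(gamma):
    return np.array([2.0 * gamma[0], 2.0 * gamma[1]])

def A(gamma):
    # Jacobian of the inequality constraints, one row per constraint.
    return np.array([[-1.0, -1.0]])

def c(gamma):
    return np.array([1.0 - gamma[0] - gamma[1]])

def kkt_check(gamma, lam, tol=1e-8):
    stationarity = grad_f(gamma) + A(gamma).T @ lam   # (a), should be ~0
    primal = np.all(c(gamma) <= tol)                  # (b)
    dual = np.all(lam >= -tol)                        # (c)
    slackness = abs(lam @ c(gamma)) <= tol            # (d)
    return stationarity, primal, dual, slackness

# The candidate (gamma_*, lambda_*) = ((0.5, 0.5), 1.0) satisfies (a)-(d).
print(kkt_check(np.array([0.5, 0.5]), np.array([1.0])))
\end{verbatim}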
By linearizing (3.2)\footnote{For an equation $F\left( x \right) = 0$, Newton's method generates a sequence $\left\{ x_{k} \right\}$, with $x_{k+1} = x_{k} + d_{k}$, that converges to $x_{*}$ such that $F\left( x_{*} \right) = 0$. At each iteration, $d_{k}$ solves the linearized equation $F\left( x_{k} \right) + F'\left( x_{k} \right) d_{k} = 0$.} and replacing the subscript $*$ with $k$, we obtain a modified system of KKT conditions (Bonnans et al., 2006, 257):
\begin{equation*}
\begin{cases}
\boldsymbol{L}_{k} \boldsymbol{d} + \boldsymbol{A}_{k}^{T} \boldsymbol{\lambda}^{QP} = -\nabla \boldsymbol{f}_{k} \\
\left( \boldsymbol{c}_{k} + \boldsymbol{A}_{k} \boldsymbol{d} \right)_{I} \le 0 \\
\left( \boldsymbol{\lambda}^{QP} \right)_{I} \ge 0 \\
\left( \boldsymbol{\lambda}^{QP} \right)_{I}^{T} \left( \boldsymbol{c}_{k} + \boldsymbol{A}_{k} \boldsymbol{d} \right)_{I} = 0,
\end{cases}
\tag{3.3}
\end{equation*}
where the subscript $k$ denotes evaluation at the $k$th iterate,
$\boldsymbol{\lambda}^{QP} := \boldsymbol{\lambda}_{k} + \boldsymbol{\mu}$, and
$\boldsymbol{L}_{k} = \left\{ \partial^{2} l\left( \boldsymbol{\gamma}; \boldsymbol{\lambda} \right)/\partial \boldsymbol{\gamma} \partial \boldsymbol{\gamma}^{T} \right\}_{k}$
is the Hessian of the Lagrangian.
The step $\boldsymbol{d}$ updates the estimate of $\boldsymbol{\gamma}$ at the
$k$th iteration of the numerical solution of $P_{I}$, and the Lagrange multiplier $\boldsymbol{\lambda}^{QP}$
estimates $\boldsymbol{\lambda} + \boldsymbol{\mu}$ at that iteration. The modified system
(3.3) is, in fact, the optimality system of the
osculating quadratic problem (QP) (Bonnans et al., 2006, 218):
\begin{equation*}
\begin{cases}
\min_{\boldsymbol{d}}\ \nabla f\left( \boldsymbol{\gamma}_{k} \right)^{T} \boldsymbol{d} + \frac{1}{2} \boldsymbol{d}^{T} \boldsymbol{L}_{k} \boldsymbol{d} \\
c_{I}\left( \boldsymbol{\gamma}_{k} \right) + A_{I}\left( \boldsymbol{\gamma}_{k} \right) \boldsymbol{d} \le 0.
\end{cases}
\tag{3.4}
\end{equation*}
The method described above is the sequential quadratic programming (SQP) algorithm, which breaks a nonlinear constrained optimization problem into a sequence of osculating quadratic problems. At each iteration, solving (3.4) for $\boldsymbol{d}_{k}$ and (3.3) for $\boldsymbol{\lambda}^{QP}$ yields the sequence of solutions $\left( \boldsymbol{\gamma}_{k}, \boldsymbol{\lambda}^{QP} \right)$, which approximates the optimal solution $\left( \boldsymbol{\gamma}_{*}, \boldsymbol{\lambda}_{*} \right)$ once the KKT conditions are satisfied. Since each quadratic subproblem is much easier to solve than the original problem, the SQP algorithm reaches the optimum gradually through a series of tractable QP steps.
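For comparison, SciPy's SLSQP method is itself an SQP-type algorithm, so the toy instance can be solved end-to-end in a few lines; this is again our illustrative problem, not the paper's computation.
\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

def f(gamma):
    return gamma[0] ** 2 + gamma[1] ** 2

# SciPy's convention is g(gamma) >= 0, so c(gamma) <= 0 is passed as -c.
cons = {"type": "ineq", "fun": lambda g: g[0] + g[1] - 1.0}
sol = minimize(f, x0=np.zeros(2), method="SLSQP", constraints=cons)
print(sol.x)  # approximately (0.5, 0.5), the KKT point verified earlier
\end{verbatim}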